ABSTRACT

Accepting that language diversity is functionally related to other variables characterizing human societies, much discussion stems from the advantageous or disadvantageous nature of language diversity in terms of national development and national unity. Discovering ways of measuring language diversity would help, in part, to resolve the language diversity issue; however, the lack of consistency and agreement in the definition of the two concepts involved hampers language planners. Bearing in mind that any language diversity measure takes into consideration all the languages present and the numbers of users of those languages, the coupling of these two independent variables makes the elimination of all ambiguity in a diversity measure impossible. For example, a society with a large number of languages and widely differing numbers of users may have the same diversity measurement as one characterized by a smaller number of languages but greater evenness of user distribution. The diversity measurement must therefore be related to both the number of languages and the degree of evenness of user distribution. Using indices and equation models from the research of language planners and ecologists, this document sets out to define the properties which a language diversity index should exhibit, for both sample and census data. (Author/CE)
THE MEASUREMENT OF LANGUAGE DIVERSITY



INTRODUCTION 



It is frequently asserted that language diversity is functionally related to other variables characterising human societies (see, for instance, Greenberg 1956, Gumperz 1962, Fishman 1968a, 1968b, 1977, Pool 1969, Lieberson & Hansen 1974). A perennial bone of contention, for example, has been the advantageous or disadvantageous nature of language diversity in terms of national development and national unity (Fishman 1968, Deutsch 1966, Simeon 1972). As Pool (1969) has pointed out, however, the current ability of language planners to estimate the relevance of language diversity to development is almost nil, and one of the main reasons for this is the lack of consistency and agreement in the definition of the two concepts involved. The purpose of the present article is not to add to the development/diversity debate, but rather to suggest reasonable and internally consistent ways of measuring language diversity. It is hoped that this contribution will enable ensuing discussions of the relationship between diversity and other variables to be pursued on the basis of less ambiguous, comparable evaluations of this elusive phenomenon.

Desirable Properties of a Language Diversity Index 

Although Pool (1969) emphasises the woolliness of contemporary definitions of the concept of language diversity, he continues to employ an apparently arbitrary definition in his attempt to clarify the relationship between this variable and national development. The measure used in his article was the size of the « largest native language community (% of population) »¹, and variations on this theme have frequently appeared in the literature (Banks & Textor 1963, Fishman 1968, Fishman, Cooper and Rosenbaum 1977, Criper and Ladefoged 1971). Although the most straightforward and unambiguous way of measuring language diversity is simply the number of languages coexisting within a given observational unit, almost all writers have in fact sought to incorporate numbers or proportions of language users into their diversity measures. The general consensus of opinion would appear to be that the more even the distribution of users among the language categories, the greater the language diversity of the unit in question. Some writers, however, characterise the units under observation as being diverse or homogeneous without regard to the numbers of different languages in use within them. Banks and Textor (1963), for instance, define polities in which one language is natively spoken by 85% of the population and in which no significant linguistic minority is present as being less diverse than polities in which the remaining 15% of speakers may be assigned predominantly to a single language. This can give rise to considerable ambiguity, in that the members of the former class are actually likely to exhibit a larger number of languages and yet be classified as less diverse.

It is the opinion of the present writer that any language diversity measure must take into account first and foremost all the languages present, but should also seek to incorporate the legitimate concern expressed by most writers for the numbers of users of these languages. Although it is not anticipated that this goal will be disputed, it should be borne in mind that the coupling of two independent variables, namely the number of languages and the distribution of speakers among them, renders the elimination of all ambiguity in a diversity measure impossible. It is quite possible, for instance, that a society featuring a large number of languages with widely differing numbers of users will have the same diversity measurement as a society characterised by a smaller number of languages but greater evenness of user distribution. Any diversity measurement should therefore be evaluated in the light of both the number of languages and the degree of evenness of user distribution (see p. …).
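To make this ambiguity concrete, the sketch below applies Shannon's index (introduced below as equation (2), with natural logarithms) to two sets of proportions invented purely for illustration: three languages with perfectly even use, and four languages with one clearly dominant. The two values come out almost identical, although the societies differ in both number of languages and evenness.

```python
import math

def shannon(proportions):
    """Shannon diversity D = -sum(p_i * ln p_i); zero proportions contribute nothing."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Hypothetical society A: three languages, users spread perfectly evenly.
society_a = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical society B: four languages, one of them clearly dominant.
society_b = [0.59, 0.21, 0.10, 0.10]

d_a = shannon(society_a)
d_b = shannon(society_b)
print(f"A: {len(society_a)} languages, D = {d_a:.3f}")
print(f"B: {len(society_b)} languages, D = {d_b:.3f}")
```

A single index value therefore cannot tell these two situations apart; only reporting the number of languages and an evenness measure alongside it does.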

The desire to incorporate numbers and distributions of speakers in a diversity index 
calls for another caveat at this point. As Fishman (1968b) has pointed out, the demogra- 
phic status of a language does not necessarily coincide with its use and fun(^ons within 
a given society. A fundamental aspect of any demographically based measures of diversity 
is, therefore, that it constitutes a surrogate measure of the diversity of a limited subset of 
the^ total set of linguistic activities. The type of linguistic activity, whose diversity it is 
required to measure should be clearly specified at the outset. It is for this re'ason that the 
more general locution 'language users' is prefered to that'of language speaker' throughout 
lathis theoretical paper. A language may be widely read and therefore used in the context of 
work, for instance, without being spoken. Indices based on numbers or proportions of 
native speakers, on the other hand, tell us something about the diversity of languages spoken 
in the honj^s of a given census or survey unit at sometime in the past, (providing migration 
is allowed for). They cannot be taken to describe diversity of^actual linguistic activity 
outside the home at a later point in time, as Lieberson (1968), fqr example, seems to 
assume in his development of Greenberg's diversity index. As mother tongue data are by 
far the most commonly used for diversity measurement purposes, it must be further em- 
phasised that, since 4iome languages (and therefore mother tongues) vary overtime, evalua- 
tions of diversity rooted in demographic data should not be based on widely differinj^ age 
groups- unless one is simply interested in measuring the mother tongue diversity of a set 
of human beings per se. An example may help to make this clear. A mother tongue based 
measure of the diversity of the population in th^ North West Highlands of Scotland, which 
did not control for age, would produce a reasonably high diversity value. It would be 
erroneous, however, to conclude that this represc-nts the diversity of linguistic ac/JW/y in 
the region's honies at any onij point in time. For the older members of the population 
(>65) the home language was almost exclusively Gaelic whereas for the younger generation 
(<;20), English has been more or less unchallenged as the language of the home. The degree 
of mother tongue diversity among the population as a whole at the.present time is undoub- 
tedly higher than that of linguistic activity among the region's homes at present or indt'^d 
at most periods in the past. Mother tongue diversity then can only he used as a surrogatii^ 

; 

^ Lieberson (1975) has shown that the size of the largest mother tcinf^ue group is in fact an excolleijl 
nonhnear predictor of cine <if the more satisfactory measures of diversity, (ireenberg s A index, defined 
boiow 
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measure for home Unguistic activity within a societal unit, and should ideally be calculated 
using data in which age has been controlled for (age cohorts): 

• 
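The Highlands argument can be illustrated numerically. The sketch below uses invented cohort counts (not census figures) in which each age cohort is almost monolingual in a different language, and computes Shannon's index (equation (2), natural logarithms) for each cohort and for the pooled, age-uncontrolled population.

```python
import math

def shannon(counts):
    """Shannon diversity (natural logs) of a dict mapping language -> number of users."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)

# Hypothetical home-language counts for two age cohorts (invented figures).
older = {"Gaelic": 950, "English": 50}    # cohort aged over 65
younger = {"Gaelic": 50, "English": 950}  # cohort aged under 20

# Pooling the cohorts ignores age, as an uncontrolled mother-tongue tally would.
pooled = {lang: older[lang] + younger[lang] for lang in older}

for label, counts in [("older", older), ("younger", younger), ("pooled", pooled)]:
    print(f"{label:>8}: D = {shannon(counts):.3f}")
```

The pooled value reaches the two-language maximum, ln 2, even though neither cohort's homes are linguistically diverse; cohort-level calculations avoid this inflation.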

As stated above, the inclusion of both numbers of languages and the distribution of users of these languages in a diversity measure will inevitably entail a certain degree of subjectivity, as the calculated values will depend upon the definition of the relationship between these two independent variables. Hurlbert (1971) gives some examples of these in his critique of the diversity concept. Nevertheless, certain properties of such a measure are clearly desirable, and here much can be learned from the work of ecologists since the Second World War. Quantitative ecologists have devoted considerable effort to the development and/or application of diversity indices to the species make-up of natural communities, often with a view to relating species diversity to other community properties such as productivity and stability (Pielou 1967). Such concerns are quite analogous to the types of problem to which Fishman (1968a), Pool (1969), Lieberson (1974, 1975b) and other scholars have addressed themselves. While no-one would suggest that languages obey the laws of biology, the logical problem of measuring diversity is identical in both cases. This is exemplified by the fact that the only well-known measure of language diversity which incorporates both numbers of languages and corresponding numbers of users, namely Greenberg's A-index (Greenberg 1956), is formally almost identical to one of the more widely used ecological indices of diversity, Simpson's index (Simpson 1949), although both appear to have been developed independently (see below).

Pielou (1975) lists three desirable properties for a diversity index, D, which is to be a function of the number of categories (languages) and the relative frequencies of items in each category (proportions of users), $D(p_1, p_2, \ldots, p_\ell)$. Weinreich (…) also evoked the first two of these properties, while Greenberg (1956) has indicated the desirability of property (3).

Expressed in sociolinguistic terms, these may be read as:



Property 1



For given $\ell$ (number of languages), D should take on its greatest value when users are apportioned evenly among all the languages present in the unit under consideration. (Note the subjectivity of the relationship between these two variables mentioned earlier; one could conceivably define diversity in exactly the opposite way.)

Property 2

Given two societal units in which users are apportioned evenly among languages, one with $\ell$ languages and one with $\ell + 1$ languages, D should take on a greater value in the latter case.

Property 3

Given two societal units characterised by identical distributions of numbers of languages and users, D should take on a greater value for a unit wherein the observed languages belong to different language groups than for a unit wherein the observed languages belong to a single language group; i.e. the diversity index should take into account the hierarchical nature of language classification and hence the concept of interlingual distance.

More formally, suppose that language users are subjected to two different classifications, namely a language group classification, G, with $g$ classes, and a language classification, L, with $\ell$ classes. Let $p_i$ ($i = 1, \ldots, g$) be the proportion of speakers in the $i$th class of the G classification and let $p_{ij}$ ($i = 1, \ldots, g$; $j = 1, \ldots, \ell$) be the proportion of these speakers in the $j$th class of the L classification. Let $\pi_{ij} = p_i p_{ij}$ be the proportion of the whole community belonging to the $i$th G class and the $j$th L class.

Now, let $D(GL)$ be the diversity of the doubly classified population, $D(G)$ its diversity under the language group classification, and $D_i(L)$ the diversity under the language classification of those speakers belonging to the $i$th G class. If, in addition, we let $\bar{D}_G(L)$ be the average of the $D_i(L)$ over all G classes, weighted by the class proportions $p_i$, it is then required that



$$D(GL) = D(G) + \bar{D}_G(L)$$



In addition to allowing for a possible hierarchical classification of languages, property 3 would provide the possibility of measuring a population's diversity not only in terms of linguistic criteria but also in terms of kindred criteria or, indeed, totally unrelated criteria. One could conceive, for instance, of an ethnolinguistic diversity index based on a classification by ethnic affiliation and a classification by mother tongue.

Pielou (1969) has shown that the only function of the proportions $(p_1, p_2, \ldots, p_\ell)$ having these three properties is

$$D(p_1, p_2, \ldots, p_\ell) = -C \sum_{i=1}^{\ell} p_i \log p_i \tag{1}$$

where C is a positive constant. If C is set equal to 1 we are left with the index

$$D = -\sum_{i=1}^{\ell} p_i \log p_i \tag{2}$$

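The diversity of a doubly classified population (property 3) would then be given by (Pielou 1969)

$$D(GL) = -\sum_{i=1}^{g}\sum_{j=1}^{\ell} \pi_{ij} \log \pi_{ij} = -\sum_{i=1}^{g} p_i \log p_i + \sum_{i=1}^{g} p_i h_i \tag{3}$$

where $h_i = -\sum_{j=1}^{\ell} p_{ij} \log p_{ij}$.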
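The additivity required by property 3 can be checked numerically for Shannon's index. The sketch below uses an invented two-group, five-language population (group and language names are placeholders) and natural logarithms, and compares the diversity over all group-language cells with the group diversity plus the $p_i$-weighted mean within-group diversity.

```python
import math

def shannon(proportions):
    """D = -sum p log p using natural logarithms."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Hypothetical doubly classified population: language group -> {language: users}.
population = {
    "group_1": {"lang_a": 400, "lang_b": 200},
    "group_2": {"lang_c": 300, "lang_d": 50, "lang_e": 50},
}
total = sum(sum(langs.values()) for langs in population.values())

# D(GL): diversity over every (group, language) cell, i.e. the pi_ij proportions.
d_gl = shannon([n / total for langs in population.values() for n in langs.values()])

# D(G): diversity under the language-group classification alone.
group_sizes = {g: sum(langs.values()) for g, langs in population.items()}
d_g = shannon([size / total for size in group_sizes.values()])

# Mean within-group diversity h_i, weighted by the group proportions p_i.
mean_within = sum(
    (group_sizes[g] / total) * shannon([n / group_sizes[g] for n in langs.values()])
    for g, langs in population.items()
)

print(f"D(GL)              = {d_gl:.6f}")
print(f"D(G) + mean D_i(L) = {d_g + mean_within:.6f}")
```

The two printed values agree to rounding error, which is exactly the decomposition in equation (3).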
Diversity Measurements Using Sample Data

Equation (2) is usually referred to as the Shannon index (Shannon and Wiener, 1949) and is generally described as H′, the measure of the information content of a code. In the language planning context it may be interpreted as the diversity per individual in a multilingual society. Note, however, that the Shannon index as defined in information theory is, strictly speaking, valid only for an infinite population. It measures the average information contained in a code in the long run rather than the information contained in a particular message. It should therefore be used only for societal units that are infinitely large, in the sense that removing samples from them causes no perceptible change in them. (When complete census data are available, diversity may be characterised quantitatively by using Brillouin's index; see below.)

The only use of this index bordering on an application to the measurement of language diversity, albeit in a non-sampling context, appears to be Sadler's (1962) little-known paper. Sadler uses a different formulation of the Shannon index:

$$\log I = \log N - \frac{1}{N} \sum_{i} n_i \log n_i \tag{4}$$



In the published examples, individuals are classified according to nationality, $n_i$ being the number of items (individuals) in the $i$th category (nation) and $N$ their total. The resulting value, namely $I$, is then interpreted as a measure of the internationality of the groups of conference delegates with which he was concerned: $I$ is the number of nations with equal representation which would be equivalent to the observed distribution.* While this interpretation of the resulting figure is a valid one, it is perhaps unfortunate that it can result in higher measures of internationality (diversity) for organisations with narrower national representation, since it favours the attainment of evenness of distribution of individuals at the expense of category diversity. This contrasts with the ecologists' approach, where a given diversity measurement is often compared with the maximum diversity obtainable by an even distribution of individuals among the same number of categories. Although the relationship between categories and numbers of items inevitably involves a certain degree of subjectivity, as mentioned earlier, it is surely more reasonable to regard the actual numbers of categories and of individuals as fixed components, and to redistribute individuals among the categories, when evaluating the distributions of maximum evenness with which to compare the diversity of observed distributions. Lieberson (1969) has stated that ratios between diversity measurements and their corresponding maximum possible values (given the same number of categories) should not be used for comparisons between units featuring different numbers of categories. He suggests that such 'standardisation procedures' give misleading results, in that the resulting 'standardised measures' may be strikingly different from those obtained using the basic diversity measures. It must be borne in mind, however, that such ratios describe another aspect of the partition of numbers of users among languages: they are not alternative diversity indices but rather measures of the evenness of the distribution of individuals among the various language categories. Inter-unit comparisons should ideally resort to both indices, as mentioned earlier in the present article.

* Sadler (1962), p. 4…
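The objection to Sadler's interpretation can be seen with two invented delegate distributions. The sketch assumes equation (4) reads $\log I = \log N - \frac{1}{N}\sum n_i \log n_i$, which makes $I$ equal to $e^{D}$ for Shannon's $D$ (natural logarithms); the function name `sadler_i` is hypothetical.

```python
import math

def sadler_i(counts):
    """Sadler's I from log I = log N - (1/N) * sum(n_i log n_i), i.e. I = exp(Shannon D)."""
    n_total = sum(counts)
    log_i = math.log(n_total) - sum(n * math.log(n) for n in counts if n > 0) / n_total
    return math.exp(log_i)

# Hypothetical conference delegations (delegates per nation), invented for illustration.
three_even = [10, 10, 10]      # 3 nations, equal delegations
ten_skewed = [91] + [1] * 9    # 10 nations, one dominant delegation

print(f"3 nations, even:    I = {sadler_i(three_even):.2f}")
print(f"10 nations, skewed: I = {sadler_i(ten_skewed):.2f}")
```

The gathering with only three nations scores an $I$ of 3, while the one with ten nations scores well below 2: the measure rewards evenness at the expense of the number of categories.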


Maximum diversity for Shannon's index under conditions of complete evenness with $\ell$ languages is

$$D_{\max} = \log \ell \tag{5}$$

The ratio of the observed diversity value to the maximum possible value (assuming the same number of individuals and languages) may then be taken as a measure of the degree of evenness of the apportionment of users to the various languages:

$$\text{Evenness} = \frac{D_{\text{obs}}}{\log \ell} \tag{6}$$
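A minimal sketch of equations (5) and (6), assuming natural logarithms and taking $\ell$ as the number of languages that actually have users; the proportions are invented.

```python
import math

def shannon(proportions):
    """D = -sum p ln p; zero proportions contribute nothing."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

def shannon_evenness(proportions):
    """Evenness = D_obs / ln(l), where l is the number of languages with users."""
    n_languages = sum(1 for p in proportions if p > 0)
    if n_languages < 2:
        raise ValueError("evenness is undefined for fewer than two languages")
    return shannon(proportions) / math.log(n_languages)

even = [0.25, 0.25, 0.25, 0.25]      # hypothetical perfectly even unit
skewed = [0.59, 0.21, 0.10, 0.10]    # hypothetical uneven unit
print(f"even:   {shannon_evenness(even):.3f}")
print(f"skewed: {shannon_evenness(skewed):.3f}")
```

Evenness is 1 for a perfectly even distribution and falls towards 0 as one language comes to dominate, whatever the number of languages.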



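According to Poole (1974), equation (2) is a biased estimate of population diversity. Where the number of languages present in the unit under consideration is known (as it usually will be), the expected value of the observed diversity value, $D_{\text{obs}}$, for a sample of $N$ individuals drawn from a population of diversity $D$ is given by the series (Hutcheson 1970)

$$E(D_{\text{obs}}) = D - \frac{\ell - 1}{2N} + \frac{1 - \sum_i p_i^{-1}}{12N^2} + \frac{\sum_i \left(p_i^{-1} - p_i^{-2}\right)}{12N^3} + \cdots \tag{7}$$

Natural logarithms must be used for the calculation of $E(D_{\text{obs}})$, the third and subsequent terms of which are usually very small. Hutcheson (1970) has also shown the variance of $D_{\text{obs}}$ to be

$$\operatorname{Var}(D_{\text{obs}}) = \frac{\sum_i p_i (\ln p_i)^2 - \left(\sum_i p_i \ln p_i\right)^2}{N} + \frac{\ell - 1}{2N^2} + \cdots \tag{8}$$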

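The observed diversities of two different samples can be compared by means of a t-test, to see, for example, whether language diversity is changing over time. t would be given by (Hutcheson 1970)

$$t = \frac{D_1 - D_2}{\sqrt{\operatorname{Var}(D_1) + \operatorname{Var}(D_2)}} \tag{9}$$

where $D_1$ and $D_2$ are the observed diversities of the two samples, and the degrees of freedom for the test by (Hutcheson 1970)

$$\text{d.f.} = \frac{\left[\operatorname{Var}(D_1) + \operatorname{Var}(D_2)\right]^2}{\operatorname{Var}(D_1)^2 / N_1 + \operatorname{Var}(D_2)^2 / N_2} \tag{10}$$

where $N_1$ and $N_2$ are the two sample sizes.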

The usefulness of the preceding formulas to language planners in countries such as Canada, which are seeking to implement multicultural or multilingual policies, is readily apparent. They permit regular monitoring of the national diversity situation and its evolution without resorting to cumbersome, full-scale census taking. For a fuller discussion of the distributional properties of Shannon's and some other indices of diversity, the reader is referred to the 1969 paper by Bowman, Hutcheson, Odum and Shenton.
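The monitoring procedure can be sketched as follows, assuming the variance series (8) is truncated after its first two terms and using invented survey counts for one unit at two dates. The function names are hypothetical; the resulting t would be referred to Student's t distribution with the computed degrees of freedom.

```python
import math

def hutcheson_stats(counts):
    """Observed Shannon diversity D_obs (natural logs) and its approximate variance.

    The variance keeps only the first two terms of the series in equation (8).
    """
    n = sum(counts)
    props = [c / n for c in counts if c > 0]
    d_obs = -sum(p * math.log(p) for p in props)
    sum_p_ln_sq = sum(p * math.log(p) ** 2 for p in props)
    variance = (sum_p_ln_sq - d_obs ** 2) / n + (len(props) - 1) / (2 * n ** 2)
    return d_obs, variance

def hutcheson_t(counts_1, counts_2):
    """t statistic of equation (9) and degrees of freedom of equation (10)."""
    d1, v1 = hutcheson_stats(counts_1)
    d2, v2 = hutcheson_stats(counts_2)
    t = (d1 - d2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / sum(counts_1) + v2 ** 2 / sum(counts_2))
    return t, df

# Hypothetical sample surveys of the same unit at two dates (invented counts).
survey_1 = [620, 250, 90, 40]
survey_2 = [540, 280, 120, 60]

t, df = hutcheson_t(survey_1, survey_2)
print(f"t = {t:.2f}, d.f. = {df:.0f}")
```

A negative t here indicates that the second survey shows the higher diversity; whether the change is significant depends on the tabulated critical value.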

In the context of the measurement of language diversity, it is perhaps worth mentioning that the Shannon diversity index is a special case of a more general class of functions used in the mathematical theory of information. Rényi (1961) has shown that, given a code of symbols, the function

$$H_a = \frac{1}{1 - a} \log \sum_i p_i^{a} \tag{11}$$

is the entropy of order a of the code. Letting a tend to 1, it can be shown (Pielou 1975) that

$$\lim_{a \to 1} H_a = -\sum_i p_i \log p_i = D_{\text{Shannon}} = H \tag{12}$$

or the entropy of order 1 of the set of the $p_i$ (proportions in category i). With a = 2 we obtain

$$H_2 = -\log \sum_i p_i^2 \tag{13}$$

The function $\sum p_i^2$, or its complement, is the only diversity measure which has so far received any widespread application in the field of sociolinguistics. Readers will recognize its complement, $1 - \sum p_i^2$, as Greenberg's (1956) A-index, the probability that any two randomly picked individuals from a given societal unit will not share the same mother tongue (or indeed any other linguistic or extra-linguistic feature); $\sum p_i^2$ itself is the probability that they will.
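The sketch below evaluates equation (11) with natural logarithms for invented proportions, showing orders approaching 1 converging on the Shannon value of equation (12), and order 2 reducing to $-\log \sum p_i^2$ as in equation (13). Treating a = 1 as the Shannon limit is a coding convention, since (11) itself is undefined there.

```python
import math

def renyi_entropy(proportions, a):
    """Renyi entropy of order a (natural logs); a == 1 is taken as the Shannon limit."""
    props = [p for p in proportions if p > 0]
    if a == 1:
        return -sum(p * math.log(p) for p in props)
    return math.log(sum(p ** a for p in props)) / (1 - a)

props = [0.59, 0.21, 0.10, 0.10]   # hypothetical language proportions

# Orders close to 1 approach the Shannon value.
for a in (0.9, 0.99, 0.999, 1):
    print(f"H_{a} = {renyi_entropy(props, a):.4f}")

# Order 2: sum p_i^2 is the probability that two individuals drawn at random
# (with replacement) share a language; its complement is the probability they differ.
same_language = sum(p ** 2 for p in props)
print(f"H_2 = {renyi_entropy(props, 2):.4f}, "
      f"P(same) = {same_language:.4f}, P(different) = {1 - same_language:.4f}")
```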

It is interesting to note that Greenberg's index is almost identical to that proposed by Simpson (1949) as a measure of ecological diversity, namely

$$\lambda = \frac{\sum_i n_i (n_i - 1)}{N (N - 1)} \tag{14}$$

where $n_i$ = number of individuals in category i and N = total number of individuals. The difference is that Simpson suggested choosing individuals without replacement. Lieberson (1969) has pointed out that essentially similar indices have in fact been developed by workers in a variety of other fields. As in the case of Greenberg's index, ecologists tend to follow Pielou's (1969) recommendation of subtracting the resulting values from unity in order to obtain an index which increases with diversity rather than uniformity. Thus

$$D_{\text{Simpson}} = 1 - \frac{\sum_i n_i (n_i - 1)}{N (N - 1)} \tag{15}$$

and, for proportions (choice with replacement),

$$D = 1 - \sum_i p_i^2 \tag{16}$$

This measure of diversity is widely used by ecologists and has received some application to census data on mother tongue diversity by Greenberg and later by Lieberson (1964, 1974, 1975a, 1975b).
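The practical difference between Simpson's without-replacement form (14) and the with-replacement sum $\sum p_i^2$ is easy to see with invented counts: the two agree closely for a large unit but diverge for a very small one with the same proportions.

```python
def simpson_lambda(counts):
    """Simpson's (1949) lambda: sum n_i(n_i - 1) / (N(N - 1)), sampling without replacement."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def with_replacement(counts):
    """sum p_i^2: the same probability when individuals are drawn with replacement."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

# Hypothetical small and large units with identical proportions (60/30/10 per cent).
small = [6, 3, 1]
large = [6000, 3000, 1000]

for label, counts in [("small", small), ("large", large)]:
    lam = simpson_lambda(counts)
    print(f"{label}: 1 - lambda = {1 - lam:.4f}, 1 - sum p^2 = {1 - with_replacement(counts):.4f}")
```

For units of the size typical of census tabulations the correction is negligible.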

Although it can be shown (Pielou 1969) that the Simpson index, and by extension the Greenberg index, are formally identical for both fully censused and sampled populations, it will be recalled that Shannon's formula (2) is inappropriate for fully censused populations (see above). When census data are available to the investigator, an appropriate measure of diversity is Brillouin's index (Brillouin 1962). The measurement of the language diversity of fully censused societal units is discussed below.

It is perhaps not superfluous at this juncture to point out that Hill (1973) has demonstrated that different indices measure different aspects of the partition of items among categories. They differ in the importance which they assign to the rarer or the more commonly used languages respectively. This corresponds to the inevitable subjectivity in diversity indices mentioned earlier. He suggests that, rather than entropies, which are logarithms and 'harder to visualise', diversity numbers should be used, defined as the reciprocal of the (a - 1)th root of a weighted mean (with the $p_i$ themselves as weights) of the (a - 1)th powers of the proportional abundances of the $\ell$ categories. More formally

$$N_a = \left(p_1^{a} + p_2^{a} + \cdots + p_\ell^{a}\right)^{1/(1-a)} \tag{17}$$

where $N_a$ is the diversity number of order a. $N_0$ would then measure diversity simply in terms of the number of languages present, $N_1$ (taken as the limit a → 1) would be $\exp(D_{\text{Shannon}})$, and $N_2$ would be the reciprocal of $\sum p_i^2$, i.e. $1/(p_1^2 + p_2^2 + \cdots + p_\ell^2)$. As the order of the diversity number increases, the importance assigned to the more widely used languages would increase.

Hill further claims that $N_a$ is a strictly decreasing function of a (it is constant only when all the $p_i$ are equal) and that $N_1$, although a transformation of Shannon's index, is in no way exceptional. However, as Pielou (1969, 19…) shows, the only function of the proportions of items, $(p_1, \ldots, p_\ell)$, having property (3) is Shannon's formula, and it is therefore to be preferred to Greenberg's index as a more flexible tool for the measurement of the language diversity of sample data sets when this property is required.

Diversity Measurement Using Census Data

Brillouin's information-theoretic index is defined as (Brillouin 1962)

$$B = \frac{1}{G} \log \frac{G!}{N_1!\, N_2! \cdots N_k!} \tag{18}$$

where G is the total number of symbols in a code and $N_1, \ldots, N_k$ are the numbers of symbols of each different kind. Insofar as the measurement of language diversity is concerned, equation (18) may be reformulated as

$$D_B = \frac{1}{N} \log \frac{N!}{N_1!\, N_2! \cdots N_\ell!} \tag{19}$$

where N is the total number of individuals and $N_i$ is the number of speakers of the ith language.

$D_B$ may be interpreted, similarly to equation (2), as the language diversity per individual. Unlike Shannon's index, $D_B$ increases as a function of N for given proportions. This should not be viewed as a drawback, however, since it is not unreasonable to expect large populations to be more diverse than small ones.

The use of logarithms ensures that $D_B$ has the property of additivity (property 3) in addition to properties (1) and (2). In the case of very large values of N, Pielou (1969) suggests the use of Stirling's approximation to the factorial:

$$\ln N! \approx N(\ln N - 1) + \tfrac{1}{2}\ln(2\pi N)$$
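In practice the factorials in (19) need not be formed at all: `math.lgamma(n + 1)` returns $\ln n!$ directly. The sketch below, with invented counts in fixed 60/30/10 proportions, shows $D_B$ rising with N towards (but staying below) the Shannon value, and checks Stirling's approximation against the exact value for a census-sized N.

```python
import math

def brillouin(counts):
    """D_B = (1/N) ln(N! / (N_1! ... N_l!)), computed with lgamma to avoid huge factorials."""
    n = sum(counts)
    log_multinomial = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_multinomial / n

def ln_factorial_stirling(n):
    """Stirling's approximation: ln N! ~ N(ln N - 1) + (1/2) ln(2 pi N)."""
    return n * (math.log(n) - 1) + 0.5 * math.log(2 * math.pi * n)

def shannon(counts):
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

# Same 60/30/10 proportions at increasing (hypothetical) population sizes.
for scale in (1, 10, 1000):
    counts = [6 * scale, 3 * scale, 1 * scale]
    print(f"N = {sum(counts):>6}: D_B = {brillouin(counts):.4f}, Shannon = {shannon(counts):.4f}")

# Stirling's formula versus the exact value for a census-sized N.
n = 1_000_000
print(math.lgamma(n + 1), ln_factorial_stirling(n))
```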

Thus, should it be required to take into account the hierarchical nature of language classification in a highly diverse unit such as India, for which census data are available, one would initially define the diversity of the population in terms of its affiliation to language groups as

$$D_G = \frac{1}{N} \log \frac{N!}{N_1!\, N_2! \cdots N_i! \cdots N_g!} \tag{20}$$

where $N_1$ might be the number of individuals in the Dravidian group, $N_2$ those in the Indo-European group, and so on.

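Language diversity within the ith language group would then be defined as

$$D_i = \frac{1}{N_i} \log \frac{N_i!}{N_{i1}!\, N_{i2}! \cdots N_{i\ell_i}!} \tag{21}$$

where there are $\ell_i$ languages in the ith group, $N_i$ is the number of individuals in language group i, and $N_{ij}$ is the number of users of language j in that group. Total diversity would be given by

$$D_{GL} = \frac{1}{N} \log \frac{N!}{\prod_{j=1}^{\ell_1} N_{1j}! \, \prod_{j=1}^{\ell_2} N_{2j}! \cdots \prod_{j=1}^{\ell_g} N_{gj}!} \tag{22}$$

Multiplying the argument of the logarithm in (22) by $\dfrac{N_1! \cdots N_g!}{N_1! \cdots N_g!}$ (that is, by unity), we obtain

$$D_{GL} = \frac{1}{N} \log \frac{N!}{N_1! \cdots N_g!} + \frac{1}{N} \sum_{i=1}^{g} \log \frac{N_i!}{\prod_{j=1}^{\ell_i} N_{ij}!}$$

and, multiplying each term of the sum by $N_i / N_i$, we obtain

$$D_{GL} = D_G + \sum_{i=1}^{g} \frac{N_i}{N} D_i \tag{23}$$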
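The decomposition (23) holds exactly for Brillouin's index, not just approximately. The sketch below checks it on invented census counts for three language groups (group names are placeholders), computing (20), (21) and (22) with `math.lgamma`.

```python
import math

def brillouin(counts):
    """D_B = (1/N) ln(N! / prod N_i!), via lgamma."""
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)) / n

# Hypothetical census counts: language group -> users of each language in the group.
census = {
    "group_1": [5000, 3000, 1200],
    "group_2": [2500, 400],
    "group_3": [700, 150, 50],
}
group_totals = [sum(langs) for langs in census.values()]
n = sum(group_totals)

d_gl = brillouin([c for langs in census.values() for c in langs])   # equation (22)
d_g = brillouin(group_totals)                                      # equation (20)
within = sum((n_i / n) * brillouin(langs)                          # equations (21), (23)
             for n_i, langs in zip(group_totals, census.values()))

print(f"D_GL = {d_gl:.6f}; D_G + sum (N_i/N) D_i = {d_g + within:.6f}")
```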
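Equation (23) may be extended to take in three (or more) levels of language classification, such as family, group and language, as follows:

$$D_{FGL} = D_F + \sum_{i} \frac{N_i}{N} D_i(G) + \sum_{i}\sum_{j} \frac{N_{ij}}{N} D_{ij}(L) \tag{24}$$

where $N_i$ is the number of users in the ith family, $N_{ij}$ the number in the jth group of that family, $D_F$ the diversity among families (defined as in (20)), $D_i(G)$ the diversity among groups within the ith family, and $D_{ij}(L)$ the diversity among languages within the jth group of the ith family (each defined as in (21)).

In order that the non-mathematically inclined reader may find his way through this welter of subscripts, a visual example of the application of (24) to a hypothetical threefold language classification is shown in figure 1.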

While the ability to allow for the hierarchical nature of language classification would prove useful at a continental or world scale, equation (19) will probably prove adequate in most instances. The degree of evenness of the distribution of fully censused individuals among languages may be measured by calculating $D_B / D_{B\max}$ (cf. equation 6). Pielou (1975, 1977) shows $D_{B\max}$ to be

$$D_{B\max} = \frac{1}{N} \log \frac{N!}{(X!)^{\ell - r}\,(Y!)^{r}} \tag{25}$$

where $X = [N/\ell]$, the integer part of the total number of individuals divided by the number of languages, $Y = X + 1$, and r is the number of languages receiving Y users, so that $N = (\ell - r)X + rY$. For the purposes of census language data analysis this expression can of course be simplified, entailing no noticeable loss of precision, to

$$D_{B\max} \approx \frac{1}{N} \log \frac{N!}{\left[(N/\ell)!\right]^{\ell}} \tag{26}$$
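A sketch of the census evenness calculation, on invented counts for four languages. The exact maximum follows (25); the simplified version assumes every language receives $N/\ell$ users, with $(N/\ell)!$ evaluated as $\Gamma(N/\ell + 1)$ through `math.lgamma` (this reading of the simplification is an assumption). For census-sized N the two agree to many decimal places.

```python
import math

def brillouin(counts):
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)) / n

def brillouin_max(n, n_langs):
    """Equation (25): users split as evenly as possible, X = floor(N/l), Y = X + 1."""
    x = n // n_langs
    r = n - n_langs * x               # number of languages receiving Y = X + 1 users
    log_denominator = (n_langs - r) * math.lgamma(x + 1) + r * math.lgamma(x + 2)
    return (math.lgamma(n + 1) - log_denominator) / n

def brillouin_max_simplified(n, n_langs):
    """Assumed simplification: every language gets N/l users; (N/l)! as Gamma(N/l + 1)."""
    return (math.lgamma(n + 1) - n_langs * math.lgamma(n / n_langs + 1)) / n

counts = [61234, 25011, 9876, 3880]   # hypothetical census counts for four languages
n, n_langs = sum(counts), len(counts)
exact_max = brillouin_max(n, n_langs)
print(f"D_Bmax exact = {exact_max:.8f}, simplified = {brillouin_max_simplified(n, n_langs):.8f}")
print(f"evenness D_B / D_Bmax = {brillouin(counts) / exact_max:.4f}")
```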



It will be recalled that the equations based on Brillouin's index are appropriate to the measurement of the diversity of fully censused populations, and that there is therefore no need to calculate their standard error.



Some Other Approaches to Diversity Measurement 

Pielou (1969) quotes a geometrical interpretation of the concept of diversity introduced by McIntosh (1967). McIntosh suggested that a population consisting of N individuals and $\ell$ discrete categories, with $N_i$ individuals in the ith category, may be interpreted as a point in an $\ell$-dimensional space with coordinates $(N_1, N_2, \ldots, N_\ell)$. The distance of this point from the origin, by Pythagoras' theorem, is

$$U = \sqrt{\sum_{i=1}^{\ell} N_i^2} \tag{27}$$

For a given N, the greater the number of categories ($\ell$) among which individuals are spread, the smaller the distance tends to be; U may therefore be interpreted as a measure of the language homogeneity of the population. $U_{\max}$ will be attained when all individuals are assigned to a single category ($U = N$). $U_{\min}$ will be attained when every individual speaks a different language ($U = \sqrt{N}$). As with the Greenberg-Simpson index, this measure would describe language homogeneity rather than language diversity. It is appropriate, therefore, to take

$$D = N - U \tag{28}$$

as a measure of diversity. McIntosh further proposed
$$D = \frac{N - U}{N - \sqrt{N}} \tag{29}$$

as a measure of diversity which is independent of the population size, N.

McIntosh also developed a measure of evenness of distribution of individuals among categories. Assuming that the total population may be divided exactly by the number of categories (languages), resulting in $N/\ell$ individuals using each language, then

$$U_{\min} \text{ for given } N, \ell = \sqrt{\ell \left(\frac{N}{\ell}\right)^2} = \frac{N}{\sqrt{\ell}} \tag{30}$$

The N-complement of $U_{\min}$ for given N, $\ell$ is thus

$$D_{\max} \text{ for given } N, \ell = N - \frac{N}{\sqrt{\ell}} \tag{31}$$

The degree of evenness of a given distribution may therefore be measured by

$$E = \frac{N - U}{N - N/\sqrt{\ell}} \tag{32}$$
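McIntosh's quantities (27)-(32) reduce to a few lines of arithmetic. The sketch below uses invented counts; the function name `mcintosh` is hypothetical, and it refuses single-language units, for which the evenness denominator in (32) is zero.

```python
import math

def mcintosh(counts):
    """Return (U, D, D_size_free, evenness) following equations (27)-(32)."""
    n = sum(counts)
    n_langs = len(counts)
    if n_langs < 2:
        raise ValueError("evenness needs at least two languages")
    u = math.sqrt(sum(c * c for c in counts))          # (27) distance from the origin
    d = n - u                                          # (28)
    d_size_free = (n - u) / (n - math.sqrt(n))         # (29)
    evenness = (n - u) / (n - n / math.sqrt(n_langs))  # (32), using (31) as the maximum
    return u, d, d_size_free, evenness

counts = [600, 300, 100]   # hypothetical users of three languages
u, d, d_size_free, evenness = mcintosh(counts)
print(f"U = {u:.1f}, D = {d:.1f}, D/(N - sqrt N) = {d_size_free:.4f}, E = {evenness:.4f}")
```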
McIntosh's index (27) is somewhat reminiscent of Weinreich's suggested D,

$$D = 1 - \sum_i \pi_i^2 \tag{33}$$

although Weinreich advocated the use of proportions ($\pi_i$) rather than of numbers of users. Weinreich's D is of course also a transformation of Greenberg's A-index; its superiority to the latter author's measure is not readily apparent, however, and it has not been employed elsewhere to the present writer's knowledge.

As a final example of the more promising methods of measuring language diversity, it is perhaps worth pointing out that, in some cases, the standard deviation of the distribution of numbers of users of each language could constitute a quantitative indication of the diversity of a set of languages and their users (for a given number of users and languages, a smaller standard deviation indicating a more even distribution). It would have to be interpreted with care, however, because of the weighting of extreme deviations from the mean which would be likely in this context. Furthermore, it could only be used for the comparison of societal units featuring roughly similar numbers of languages. It might nevertheless be considered for the measurement of change in diversity within a given unit over time, and has the advantage of being widely known.
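The standard-deviation approach can be sketched with Python's `statistics` module, whose `stdev` uses the sample denominator $\ell - 1$ and `pstdev` the census denominator $\ell$, matching the two variants listed in Table 1. The counts for the two dates are invented and share the same total and number of languages, the only situation in which the comparison is meaningful.

```python
import statistics

# Hypothetical users per language in one unit at two dates (invented figures;
# both dates have the same total population and the same three languages).
earlier = [7000, 2000, 1000]
later = [6000, 2500, 1500]

for label, counts in [("earlier", earlier), ("later", later)]:
    sample_sd = statistics.stdev(counts)    # denominator l - 1, for sample data
    census_sd = statistics.pstdev(counts)   # denominator l, for census data
    print(f"{label}: sample SD = {sample_sd:.1f}, census SD = {census_sd:.1f}")
```

The lower standard deviation at the later date reflects a more even spread of users, i.e. increased diversity in the sense used above.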

Table 1 summarises the definitions and applicability of the various indices reviewed above.

The foregoing section has reviewed some of the more promising ways of measuring language diversity which are currently available to the investigator. The following pages discuss some of the pitfalls which may be encountered in seeking to apply such measures.



Problems in the Application of a Diversity Index

1. Classification of individuals

One difficulty in the application of a diversity index is the unambiguous assignment of individuals to a discrete language class. A prerequisite for this, as stated above, is a clear statement of the subset of linguistic activities for which a diversity measurement is required, since different languages may be used for different activities. In the case of individuals using more than one language for a given activity, additional multilingual categories could be created (Greenberg 1956). A further requirement is that languages may in fact be subjected to a discrete classification, a situation which has recently been attained, at least on a genetic basis (C.F. and F.M. Voegelin, 1977).

2. The modifiable unit area problem

More intractable difficulties are raised by the definition of the areal unit for which a diversity measurement is to be made. Obviously, the measurement of the language diversity of a set of human beings requires a certain minimum level of aggregation. But, as a perusal of any modern textbook in theoretical geography will reveal (e.g. Yeats, 1974; Taylor, 1977; Harvey, 1969), the parameters of a spatial distribution will change according to the level of aggregation at which it is examined, often quite substantially (cf. Lieberson and O'Connor, 1975a). This, of course, is the areal aspect of the celebrated ecological fallacy (Robinson, 1950; Duncan et al., 1961; Scheuch, 1966). There is not space to go into the general problem in detail here; the reader will find a thorough discussion in Duncan et al. (1961). Scheuch sums the matter up succinctly when he states:

In the logic of inquiry it does not make any difference whether the basis for grouping individual units is a territory or some other criterion; what is essential is the effect that this criterion has on the control over the internal variability of units. Thus the general issue underlying the discussion of the ecological fallacy is really the relation of the criterion, according to which units are grouped, to the type of inference intended when using the results of aggregated units.⁵



⁵ Scheuch (1966), p. 154.



TABLE 1

Index               Definition                                                  Type of data

Shannon             $-\sum_{i=1}^{k} \frac{N_i}{N} \log \frac{N_i}{N}$          Sample data
Brillouin           $\frac{1}{N} \log \frac{N!}{N_1!\,N_2! \cdots N_k!}$        Census data
Simpson/Greenberg   $1 - \sum_{i=1}^{k} \left( \frac{N_i}{N} \right)^2$         Sample or census data
McIntosh            $\frac{N - \sqrt{\sum_{i=1}^{k} N_i^2}}{N - \sqrt{N}}$      Census data
Stand. Dev.                                                                     Sample data (or census data with denominator = 2)
Weinreich                                                                       Census data

N_i = number of users of the ith language; N = total population; k = number of languages.
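The four indices in Table 1 whose definitions are legible can be computed directly from the user counts N_i. This is a sketch assuming natural logarithms and the standard ecological forms of these indices; the Standard Deviation and Weinreich measures are left out because their definitions cannot be recovered here, and the counts in the usage lines are hypothetical.

```python
import math

def shannon(counts):
    """Shannon: -sum (N_i/N) log(N_i/N); empty classes contribute nothing."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def brillouin(counts):
    """Brillouin: (1/N) log(N! / (N_1! N_2! ... N_k!)).

    math.lgamma(x + 1) equals log(x!), which avoids forming huge factorials."""
    n = sum(counts)
    log_multinomial = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_multinomial / n

def simpson_greenberg(counts):
    """Simpson/Greenberg: 1 - sum (N_i/N)^2."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def mcintosh(counts):
    """McIntosh: (N - sqrt(sum N_i^2)) / (N - sqrt(N)); requires N > 1."""
    n = sum(counts)
    u = math.sqrt(sum(c * c for c in counts))
    return (n - u) / (n - math.sqrt(n))

# Hypothetical population: 50, 30 and 20 users of three languages.
counts = [50, 30, 20]
for index in (shannon, brillouin, simpson_greenberg, mcintosh):
    print(f"{index.__name__:18s} {index(counts):.4f}")
```

For the same counts, the Brillouin value comes out somewhat below the Shannon value, since it treats the population as a finite census rather than as a sample from an infinite one.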






Language data are usually collected on a local areal basis (census tracts, counties, etc.) and subsequently aggregated to a greater or lesser degree prior to final presentation. The question is, at what level should the investigator work? In the case of repeated diversity measurements within a single observational unit over time, no great difficulty should be encountered, provided the unit is a meaningful one in terms of language diversity. Weinreich suggests that « ideally, boundaries should reflect actual communication patterns »⁶. Such a unit will not necessarily correspond to the administratively determined boundaries of most government data collection areas, of course. In the case of inter-unit comparisons, however, care must be exercised, as such a procedure may mean that the researcher is comparing results which were obtained at radically different levels of aggregation. Greenberg (1956) pointed this out in his pioneering paper, but his advice has not always been heeded, particularly insofar as cross-polity comparisons are concerned. Is it meaningful, one may ask, to compare the language diversity of Eire (pop. < 3 mill.) with that of the U.S.S.R. (pop. > 220 mill.)?
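How strongly the choice of unit can drive the result is easiest to see in an extreme case. In this sketch, which uses hypothetical counts and Greenberg's A as the measure, two tracts are each completely homogeneous, yet the region formed by pooling them is as diverse as any two-language population can be.

```python
def greenberg_a(counts):
    """Greenberg's A: 1 - sum of squared language proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical tracts: number of users of language X and language Y.
tracts = {"tract 1": [1000, 0], "tract 2": [0, 1000]}

for name, counts in tracts.items():
    print(name, greenberg_a(counts))        # 0.0 for each tract

# The same 2000 individuals, aggregated into one regional unit.
pooled = [sum(col) for col in zip(*tracts.values())]
print("pooled region", greenberg_a(pooled))  # 0.5
```

Nothing about the individuals changes between the two readings; only the areal unit does, which is the difficulty with comparing units as unlike as Eire and the U.S.S.R.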

The modifiable unit area problem is particularly serious when one is interested in relating a measure of diversity to other variables. Lieberson and Hansen (1974), for example, examine the relationship between diversity (Greenberg's A-index) and urbanisation in the case of the U.S.S.R. at several points in time. No noticeable correlation emerges, and it is concluded that the two variables are not related. This is no doubt perfectly true at the scale of the U.S.S.R. It must be borne in mind, however, that the linguistically diverse regions of the U.S.S.R. have not generally coincided with those characterised by high degrees of urbanisation. Percentages of city dwellers at any one time may have tended to reflect largely the situation in European Russia, while language diversity readings may have been swollen by the linguistic situation in other areas such as Soviet Central Asia. To some extent, therefore, the correlated measurements may not have referred unambiguously to the same groups of individuals, and, consequently, that which is valid at the continental scale of the U.S.S.R. may not hold at the regional scale of, say, Kazakhstan.
At a regional scale, Lieberson, Dalto and Johnston (1975b) do report a noticeable, positive diversity/urbanisation correlation within the U.S.S.R.

It is even more difficult to assess the validity of cross-national (i.e. spatial) correlations where the levels of aggregation of the units of observation vary wildly among themselves (e.g. Liechtenstein and India, Luxembourg and Canada, etc.). A number of writers, including the present author (Brougham, 1969), have viewed the scale phenomenon as a feature to be studied in its own right rather than as a problem to be eliminated. The concept of language diversity, after all, presupposes data aggregation, and its relation to other variables implies ecological rather than individual correlations. It is therefore advisable, wherever possible, to measure diversity, and hypothesised explanatory variables, at several levels of aggregation in order to discover at which scale, if any, variation, order and relationships exist. Every effort should be made to avoid spatial correlations of language diversity measurements with other variables when units of observation are at totally incompatible levels of data aggregation.
CONCLUSION



The preceding pages have presented the properties which one might reasonably expect a language diversity index to exhibit, and have reviewed some of the indices which are presently available for the measurement of this phenomenon, using either sample or census data. It was further suggested that one of the main problems in the measurement of diversity, and its subsequent correlation with other variables, is the definition of the observational units. Ways of attenuating this difficulty are currently being investigated at the CIRB and will be presented shortly. It is also planned to apply the more promising of the diversity indices reviewed here to the data accumulated at the Centre by Kloss & McConnell (1974, 1978, 1979) in their study of the linguistic composition of the nations of the world, thereby providing a picture of language diversity across the globe at various scales. It is also hoped that other workers will find these indices of value in terms of the evaluation and explanation of the phenomenon of language diversity.



FIGURE 1: HYPOTHETICAL EXAMPLE OF DIVERSITY MEASUREMENT FOR A THREEFOLD LANGUAGE CLASSIFICATION USING BRILLOUIN'S INDEX
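A computation of the kind a threefold Brillouin example involves can be sketched as follows. It assumes a hypothetical census population of N = 30, natural logarithms, and the Brillouin definition from Table 1. It evaluates the index for every possible split of the 30 users among three languages and confirms that diversity is zero when everyone uses one language and greatest when the three groups are equal.

```python
import math

def brillouin(counts):
    """Brillouin's index, (1/N) log(N! / prod N_i!), via log-gamma."""
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)) / n

N = 30  # hypothetical total population
# Every way of distributing N users among three languages (a + b + c = N).
grid = {(a, b, N - a - b): brillouin([a, b, N - a - b])
        for a in range(N + 1) for b in range(N + 1 - a)}

most_diverse = max(grid, key=grid.get)
print("most diverse split:", most_diverse, round(grid[most_diverse], 4))
print("single-language split:", (30, 0, 0), grid[(30, 0, 0)])
```

For a finite census the maximum stays below log 3, the Shannon maximum for three classes. This is one of the ways Brillouin's index departs from the sample-based Shannon measure.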